In the field of computational biology, a planted motif search (PMS), also known as an (''l'', ''d'')-motif search (LDMS), is a method for identifying conserved motifs within a set of nucleic acid or peptide sequences. PMS is known to be NP-complete. The time complexities of most planted motif search algorithms depend exponentially on the alphabet size and ''l''. The PMS problem was first introduced by Keich and Pevzner.

The problem of identifying meaningful patterns (e.g., motifs) in biological data has been studied extensively because such patterns play a vital role in understanding gene function and human disease, and may serve as therapeutic drug targets.

== Description ==

The search problem may be summarized as follows:

''The input consists of n strings (s1, s2, …, sn), each of length m, over an alphabet Σ, together with two integers l and d. Find all strings x such that |x| = l and every input string contains at least one substring of length l that is within a Hamming distance of at most d from x. Each such x is referred to as an (l, d) motif.''

For example, if the input strings are GCGCGAT, CACGTGA, and CGGTGCC, with ''l'' = 3 and ''d'' = 1, then GGT is a motif of interest. The first input string has GAT as a substring, the second has CGT, and the third has GGT itself. GAT and CGT each differ from GGT in exactly one position, so both are within a Hamming distance of 1 from GGT.

The variants of a motif that occur in the input strings are called instances of the motif. For example, GAT is an instance of the motif GGT that occurs in the first input string. A given set of input strings may contain zero or more (''l'', ''d'') motifs.

Many of the known algorithms for PMS consider DNA strings, for which Σ = {A, C, G, T}. There are also algorithms that handle protein strings.
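The problem statement can be illustrated with a minimal brute-force sketch in Python. It is not one of the published PMS algorithms: it simply enumerates every string of length ''l'' over the alphabet and checks each one against the definition, so its running time grows as |Σ|^''l''. The function names (<code>hamming</code>, <code>occurs_within</code>, <code>planted_motifs</code>) are hypothetical and chosen here for illustration only, and the DNA alphabet is assumed as the default.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(x, s, d):
    """True if some substring of s with length len(x) is within distance d of x."""
    l = len(x)
    return any(hamming(x, s[i:i + l]) <= d for i in range(len(s) - l + 1))


def planted_motifs(strings, l, d, alphabet="ACGT"):
    """Exhaustively test all |alphabet|**l candidates (illustrative only)."""
    motifs = []
    for candidate in product(alphabet, repeat=l):
        x = "".join(candidate)
        # x is an (l, d) motif only if every input string holds an instance of it
        if all(occurs_within(x, s, d) for s in strings):
            motifs.append(x)
    return motifs


if __name__ == "__main__":
    seqs = ["GCGCGAT", "CACGTGA", "CGGTGCC"]
    found = planted_motifs(seqs, l=3, d=1)
    print("GGT" in found)  # GGT is the motif from the example in the text
```

Exhaustive enumeration is practical only for small ''l''; it is shown here because it mirrors the definition directly and makes the exponential dependence on ''l'' and the alphabet size easy to see.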